Univariate Plots Section

Summarize the data set data_eu.

The data set data_eu has 1638 observations and 12 variables.

## [1] 1638   12

The variables in data_eu are:

##  [1] "Country"         "Year"            "Gender"         
##  [4] "BMI_Index"       "Bloodpressure"   "Cholesterol"    
##  [7] "Sugar"           "Food"            "Income"         
## [10] "Life_expectancy" "Region"          "Period"
## 'data.frame':    1638 obs. of  12 variables:
##  $ Country        : Factor w/ 40 levels "Albania","Armenia",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Year           : Factor w/ 25 levels "1980","1981",..: 1 1 2 2 3 3 4 4 5 5 ...
##  $ Gender         : Factor w/ 2 levels "female","male": 2 1 2 1 1 2 2 1 2 1 ...
##  $ BMI_Index      : num  25.2 25.2 25.2 25.2 25.2 ...
##  $ Bloodpressure  : num  133 132 133 132 132 ...
##  $ Cholesterol    : num  5.01 5.04 5 5.04 5.03 ...
##  $ Sugar          : num  46.6 46.6 46.6 46.6 46.6 ...
##  $ Food           : num  2660 2660 2748 2748 2692 ...
##  $ Income         : int  4218 4218 4227 4227 4237 4237 4248 4248 4259 4259 ...
##  $ Life_expectancy: num  70.9 70.9 71 71 71 71 71 71 71.5 71.5 ...
##  $ Region         : Factor w/ 4 levels "Eastern Europe",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Period         : Factor w/ 3 levels "1980 - 1989",..: 1 1 1 1 1 1 1 1 1 1 ...
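Output of this kind could come from a handful of base R inspection calls. The following is a minimal sketch, not the original code: `toy` is a hypothetical four-row stand-in for data_eu that reuses a few column names from the output above.

```r
# Hypothetical miniature of data_eu; only the column names follow the report.
toy <- data.frame(
  Country         = factor(c("Albania", "Albania", "Turkey", "Turkey")),
  Year            = factor(c("1980", "1980", "1980", "1980")),
  Gender          = factor(c("male", "female", "female", "male")),
  Life_expectancy = c(70.9, 70.9, 62.7, 62.7)
)

dim(toy)              # number of rows and columns
names(toy)            # variable names
str(toy)              # type and first values of every variable
nlevels(toy$Country)  # number of distinct countries
levels(toy$Gender)    # factor levels, sorted alphabetically
summary(toy)          # per-variable summary
```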

The variable Country contains information for forty different European countries.

## [1] 40

These are:

##  [1] "Albania"                "Armenia"               
##  [3] "Austria"                "Azerbaijan"            
##  [5] "Belarus"                "Belgium"               
##  [7] "Bosnia and Herzegovina" "Bulgaria"              
##  [9] "Croatia"                "Cyprus"                
## [11] "Denmark"                "Estonia"               
## [13] "Finland"                "France"                
## [15] "Georgia"                "Germany"               
## [17] "Greece"                 "Hungary"               
## [19] "Iceland"                "Ireland"               
## [21] "Italy"                  "Kazakhstan"            
## [23] "Latvia"                 "Macedonia, FYR"        
## [25] "Malta"                  "Moldova"               
## [27] "Netherlands"            "Norway"                
## [29] "Poland"                 "Portugal"              
## [31] "Romania"                "Russia"                
## [33] "Slovak Republic"        "Slovenia"              
## [35] "Spain"                  "Sweden"                
## [37] "Switzerland"            "Turkey"                
## [39] "Ukraine"                "United Kingdom"

The variable Year contains information for 25 years. The levels of variable Year are:

##  [1] "1980" "1981" "1982" "1983" "1984" "1985" "1986" "1987" "1988" "1989"
## [11] "1990" "1991" "1992" "1993" "1994" "1995" "1996" "1997" "1998" "1999"
## [21] "2000" "2001" "2002" "2003" "2004"

The variable Gender has the following levels.

## [1] "female" "male"

The variable Region has 4 levels. These are:

## [1] "Eastern Europe"  "Northern Europe" "Southern Europe" "Western Europe"

The variable Period has 3 levels. These are:

## [1] "1980 - 1989" "1990 - 1999" "2000 - 2004"

The summary of the data frame is shown below:

##      Country          Year         Gender      BMI_Index    
##  Albania :  50   1993   :  80   female:819   Min.   :23.38  
##  Austria :  50   1994   :  80   male  :819   1st Qu.:24.88  
##  Belgium :  50   1995   :  80                Median :25.33  
##  Bulgaria:  50   1996   :  80                Mean   :25.37  
##  Cyprus  :  50   1997   :  80                3rd Qu.:25.85  
##  Denmark :  50   1998   :  80                Max.   :28.01  
##  (Other) :1338   (Other):1158                               
##  Bloodpressure    Cholesterol        Sugar             Food     
##  Min.   :120.0   Min.   :4.502   Min.   :  5.48   Min.   :1570  
##  1st Qu.:129.7   1st Qu.:5.154   1st Qu.: 82.19   1st Qu.:2981  
##  Median :132.2   Median :5.365   Median :106.85   Median :3235  
##  Mean   :132.0   Mean   :5.385   Mean   :102.22   Mean   :3180  
##  3rd Qu.:134.9   3rd Qu.:5.658   3rd Qu.:123.29   3rd Qu.:3441  
##  Max.   :143.1   Max.   :6.241   Max.   :167.12   Max.   :3817  
##                                                                 
##      Income      Life_expectancy             Region            Period   
##  Min.   : 1466   Min.   :62.70   Eastern Europe :638   1980 - 1989:500  
##  1st Qu.:10858   1st Qu.:71.20   Northern Europe:250   1990 - 1999:738  
##  Median :21216   Median :75.00   Southern Europe:350   2000 - 2004:400  
##  Mean   :21580   Mean   :74.15   Western Europe :400                    
##  3rd Qu.:30785   3rd Qu.:77.30                                          
##  Max.   :62370   Max.   :81.10                                          
## 

The data frame data_eu covers 40 European countries. The data were gathered from 1980 until 2004, that is 25 years, for both genders (female and male). Roughly 75% of the observations have a BMI of about 25 or more (first quartile 24.88), which is the threshold for overweight. The mean systolic blood pressure is 132.0 mm Hg, above the recommended 120 mm Hg. The mean cholesterol value is 5.385 mmol/L (age-standardized mean), which lies in the borderline-high range and indicates an increased risk of heart disease. Average sugar consumption is 102.22 grams per person per day. Average daily energy consumption is 3180 kilocalories per person. The maximum life expectancy is 81.1 years. The median income is $21216 per person per year.

Life expectancy

  • The distribution is negatively skewed: the mean (74.15) is below the median (75).
  • A life expectancy of 76 has the highest frequency.

Which country has the lowest life expectancy in Europe?

##      Country Year Gender BMI_Index Bloodpressure Cholesterol Sugar    Food
## 1513  Turkey 1980 female  26.06155      127.1133    4.867945 65.75 3277.66
## 1514  Turkey 1980   male  23.66064      125.8545    4.812084 65.75 3277.66
##      Income Life_expectancy          Region      Period
## 1513   7828            62.7 Southern Europe 1980 - 1989
## 1514   7828            62.7 Southern Europe 1980 - 1989

It is Turkey in 1980.
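Rows like these can be found by comparing each value with the minimum or maximum of its column. The sketch below uses a hypothetical `toy` data frame whose life expectancy values are taken from the two tables in this section; it is one possible approach, not necessarily the one used for the report.

```r
# Hypothetical subset of data_eu (values copied from the printed rows).
toy <- data.frame(
  Country         = c("Turkey", "Turkey", "Iceland", "Iceland", "Albania"),
  Year            = c(1980, 1980, 2004, 2004, 1980),
  Gender          = c("female", "male", "male", "female", "male"),
  Life_expectancy = c(62.7, 62.7, 81.1, 81.1, 70.9),
  stringsAsFactors = FALSE
)

# Keep every row that ties for the minimum / maximum life expectancy.
lowest  <- subset(toy, Life_expectancy == min(Life_expectancy))
highest <- subset(toy, Life_expectancy == max(Life_expectancy))

lowest
highest
```

Because life expectancy is shared by both genders, each extreme returns two rows, one per gender.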

Which country has the highest life expectancy in Europe?

##     Country Year Gender BMI_Index Bloodpressure Cholesterol  Sugar    Food
## 781 Iceland 2004   male  26.73403      129.7408    5.737782 153.43 3310.98
## 782 Iceland 2004 female  25.67006      119.9665    5.631419 153.43 3310.98
##     Income Life_expectancy          Region      Period
## 781  37482            81.1 Northern Europe 2000 - 2004
## 782  37482            81.1 Northern Europe 2000 - 2004

It is Iceland in 2004. The results of life expectancy summary are shown below.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   62.70   71.20   75.00   74.15   77.30   81.10

Boxplot life expectancy by country between 1980-2004

Which countries have a mean life expectancy below the European average of 74.15 years (74.14884) for 1980-2004?

## Source: local data frame [19 x 2]
## 
##                   Country    meanL
##                    (fctr)    (dbl)
## 1              Kazakhstan 63.87692
## 2                  Russia 65.63846
## 3              Azerbaijan 67.06154
## 4                 Ukraine 67.46154
## 5                  Turkey 67.99600
## 6                 Belarus 68.35385
## 7                 Moldova 68.43077
## 8                  Latvia 68.88462
## 9                 Estonia 69.54615
## 10                Romania 70.02000
## 11                Hungary 70.23600
## 12                Armenia 70.60000
## 13               Bulgaria 71.39200
## 14                Georgia 71.56154
## 15                 Poland 72.06800
## 16 Bosnia and Herzegovina 72.91538
## 17        Slovak Republic 73.09167
## 18                Albania 73.14000
## 19         Macedonia, FYR 73.59231

It is interesting to see that all of these countries except Turkey are Eastern European countries.
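A table of per-country means below the overall mean can be sketched in base R as follows. The printed header suggests dplyr was used for the report; `aggregate()` is swapped in here so the example runs without extra packages. The `toy` values are invented for illustration.

```r
# Hypothetical data: two observations per country.
toy <- data.frame(
  Country         = rep(c("Kazakhstan", "Turkey", "Iceland"), each = 2),
  Life_expectancy = c(63.5, 64.3, 67.5, 68.5, 78.0, 81.1),
  stringsAsFactors = FALSE
)

overall_mean <- mean(toy$Life_expectancy)

# Mean life expectancy per country.
by_country <- aggregate(Life_expectancy ~ Country, data = toy, FUN = mean)
names(by_country)[2] <- "meanL"

# Keep countries below the overall mean, sorted from lowest to highest.
below <- by_country[by_country$meanL < overall_mean, ]
below <- below[order(below$meanL), ]
below
```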

Next, the distribution of Cholesterol is presented.

The histogram above resembles a normal distribution.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.502   5.154   5.365   5.385   5.658   6.241

There is little difference between the mean (5.385) and the median (5.365). Only about 25% of the observations have normal cholesterol values (below 5.2 mmol/L; the first quartile is 5.154).

I wonder which country has the highest values.

##             Country Year Gender BMI_Index Bloodpressure Cholesterol  Sugar
## 1589 United Kingdom 1980   male  24.72216      136.5205    6.240528 117.81
## 1591 United Kingdom 1981   male  24.78911      136.3343    6.201839 120.55
##         Food Income Life_expectancy         Region      Period
## 1589 3116.05  20417            73.4 Western Europe 1980 - 1989
## 1591 3099.19  20149            73.8 Western Europe 1980 - 1989

Males in the United Kingdom in 1980 and 1981 have the highest cholesterol values; the maximum is 6.240528 (1980).

Which country has the lowest cholesterol value?

##        Country Year Gender BMI_Index Bloodpressure Cholesterol Sugar
## 151 Azerbaijan 2004   male  24.89376      131.3845    4.501741 43.84
##       Food Income Life_expectancy         Region      Period
## 151 2894.6   6435            69.3 Eastern Europe 2000 - 2004

It is Azerbaijan in 2004.

Boxplot Cholesterol vs. Country

Which countries have normal cholesterol values?

## Source: local data frame [14 x 2]
## 
##                   Country    meanL
##                    (fctr)    (dbl)
## 1              Azerbaijan 4.707317
## 2  Bosnia and Herzegovina 4.710209
## 3                 Georgia 4.759566
## 4                 Armenia 4.803656
## 5                 Moldova 4.820544
## 6                  Turkey 4.838926
## 7                 Albania 4.952450
## 8              Kazakhstan 4.958597
## 9          Macedonia, FYR 4.970246
## 10                Ukraine 5.032866
## 11                Croatia 5.130145
## 12                Romania 5.145889
## 13                 Russia 5.171549
## 14                Belarus 5.190451

Except for Turkey, all countries with normal cholesterol values are Eastern European countries.

Food

The histogram of variable Food is shown below.

The distribution is negatively skewed, with the highest frequency at about 3400 calories.

The summary for Food can be seen below:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1570    2981    3235    3180    3441    3817

The data set data_eu does not distinguish by gender, age, height, weight or activity level with regard to daily food supply; the average energy supply is the same for females and males. According to the summary, 50% of the observations have a daily intake of up to 3235 calories, and the mean is 3180 calories. This is high compared with the average requirement of 2400 calories for both sexes (men: 2700 calories, women: 2100 calories).

The next boxplot shows the energy intake by year.

The average intake of food energy (red dots) is above the recommended value in all years. But how does the intake of food energy look by country?

Which European countries have a normal average calorie consumption?

## Source: local data frame [3 x 2]
## 
##      Country    meanF
##       (fctr)    (dbl)
## 1    Armenia 2073.727
## 2    Georgia 2306.341
## 3 Azerbaijan 2363.422

Sugar

The distribution is negatively skewed. The highest frequency is around 120 grams per person per day, which is higher than the recommended value.

Below is the summary of Sugar.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.48   82.19  106.80  102.20  123.30  167.10

The median is 106.8 and the mean is 102.2. A healthy sugar consumption should be at most 5% of the total energy intake per person, that is 120 calories per person per day, which is equivalent to about 31 grams per day.
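The arithmetic behind the 31 gram threshold can be checked in a few lines. The 2400 calorie requirement and the 5% share come from the text; the energy density of 3.87 kcal per gram of sugar is an assumption added here to reproduce the stated figure and is not given in the report.

```r
recommended_kcal <- 2400   # average daily energy requirement quoted in the text
sugar_share      <- 0.05   # 5% of total energy intake
kcal_per_gram    <- 3.87   # ASSUMED energy density of sugar (not from the report)

sugar_kcal  <- recommended_kcal * sugar_share  # energy allowance from sugar
sugar_grams <- sugar_kcal / kcal_per_gram      # the same allowance in grams

mean_sugar <- 102.22                           # mean of Sugar from the summary
ratio      <- mean_sugar / sugar_grams         # mean consumption relative to the threshold

round(sugar_grams)
round(ratio, 1)
```

Under these assumptions the mean consumption is a little more than three times the threshold.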

In which countries is the sugar consumption 5.48 grams (the minimum) or 167.12 grams (the maximum) per day?

##   Country Year  Sugar Life_expectancy
## 1 Armenia 1993   5.48            69.2
## 2 Estonia 2003 167.12            71.4
## 3 Estonia 2004 167.12            71.9

Armenia and Estonia were the countries with the lowest and highest sugar consumption in Europe, respectively.

BMI

The distribution is positively skewed. The highest peak lies at a BMI over 25, which means overweight.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   23.38   24.88   25.33   25.37   25.85   28.01

Half of the observations have a BMI above 25.33, that is, slightly overweight. Only about 25% of the observations have a BMI in the normal range (below 25).
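Statements about the share of observations above a threshold can be checked directly rather than read off the quartiles. A minimal sketch with an invented `bmi` vector (not the report's data):

```r
# Hypothetical BMI values for six observations.
bmi <- c(23.4, 24.9, 25.3, 25.4, 25.9, 28.0)

# The mean of a logical vector is the proportion of TRUE values.
share_overweight <- mean(bmi >= 25)
share_overweight

# Quartiles for comparison with the summary output.
quantile(bmi, probs = c(0.25, 0.5, 0.75))
```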

Between 1980 and 2004 the BMI has increased.

Systolic Blood Pressure

The histogram of the systolic blood pressure shows that the distribution is negatively skewed. 50% of the observations have a systolic blood pressure of about 132 mm Hg or more, which falls in the prehypertension range (120-139 mm Hg).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   120.0   129.7   132.2   132.0   134.9   143.1

The summary confirms this. The mean and median values are 132.0 mm Hg and 132.2 mm Hg, respectively.

Blood pressure values around 130 mm Hg are the most common.

How did systolic blood pressure develop from 1980 until 2004?

From 1980 until 1991 the blood pressure decreases. In 1992 a small increase is seen for the first time, followed by a further decrease until 2004. Is the use of medication the reason for the decrease in systolic blood pressure?
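One way to locate the years in which the yearly average rises is to average by year and look at the sign of the year-to-year differences. The values in `toy` are invented so that only 1992 shows an increase; this is a sketch, not the report's code.

```r
# Hypothetical data: two observations per year.
toy <- data.frame(
  Year          = rep(1990:1993, each = 2),
  Bloodpressure = c(133.2, 132.8, 132.6, 132.4, 132.9, 132.5, 132.3, 132.1)
)

# Mean systolic blood pressure per year.
yearly <- aggregate(Bloodpressure ~ Year, data = toy, FUN = mean)

# Year-to-year change; element i compares year i + 1 with year i.
change <- diff(yearly$Bloodpressure)

# Years whose mean is higher than that of the previous year.
increase_years <- yearly$Year[-1][change > 0]
increase_years
```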

How does it look for each country during the period 1980-2004?

The majority of countries have high systolic blood pressure. The red dot denotes the average value.

But which countries have a normal systolic blood pressure?

##   Country
## 1 Iceland

Iceland is the only country with normal systolic blood pressure.

Which countries have an average systolic blood pressure greater than 140 mm Hg?

##   Country
## 1 Finland
## 2 Germany
## 3  Norway

Income

The next plot shows the histogram of variable Income.

The distribution of income per person is skewed. Evidently income is unequal, which was expected given the differences in development between countries. Political, geographical and economic factors also have an impact on income.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1466   10860   21220   21580   30780   62370

Next I look more closely at the extremes. I am interested in which country has an average income of $1466 per person per year.

## Source: local data frame [2 x 12]
## Groups: Country [1]
## 
##                  Country   Year Gender BMI_Index Bloodpressure Cholesterol
##                   (fctr) (fctr) (fctr)     (dbl)         (dbl)       (dbl)
## 1 Bosnia and Herzegovina   1993 female  25.20537      132.8245    4.808685
## 2 Bosnia and Herzegovina   1993   male  25.17015      131.9561    4.711708
## Variables not shown: Sugar (dbl), Food (dbl), Income (int),
##   Life_expectancy (dbl), Region (fctr), Period (fctr)

It was Bosnia and Herzegovina in 1993. This can be explained by the Bosnian War of 1992-1995.

Which country has the highest income per person per year in Europe?

## Source: local data frame [2 x 12]
## Groups: Country [1]
## 
##   Country   Year Gender BMI_Index Bloodpressure Cholesterol  Sugar    Food
##    (fctr) (fctr) (fctr)     (dbl)         (dbl)       (dbl)  (dbl)   (dbl)
## 1  Norway   2004 female  25.47340      127.2539    5.373211 120.55 3458.47
## 2  Norway   2004   male  26.50614      134.8698    5.456786 120.55 3458.47
## Variables not shown: Income (int), Life_expectancy (dbl), Region (fctr),
##   Period (fctr)

Norway is the country with the highest income in Europe. This was expected because Norway is an exporter of oil and gas.

How do the various countries score regarding income?

Countries in Eastern Europe have the lowest income among the European countries, followed by countries in Southern Europe. In Western and Northern Europe the income is high; it is highest in Norway.

Univariate Analysis

What is the structure of your dataset?

There are 1638 observations in the data set data_eu with 12 features. These are: Country, Year, Gender, BMI_Index, Bloodpressure, Cholesterol, Sugar, Food, Income, Life_expectancy, Region and Period. The variables Country, Year, Gender, Region and Period are factor variables with the following levels.

Country is sorted alphabetically and has 40 levels: from Albania to United Kingdom

Year is sorted and has 25 levels: 1980 to 2004

Gender has two levels: female and male

  • About 75% of the observations have a BMI of about 25 or more, which means overweight
  • 50% of the observations have a systolic blood pressure of 132.2 mm Hg or more (prehypertension range)
  • The mean value of Cholesterol is 5.385 mmol/L, which is already borderline high risk
  • The average sugar consumption per person per day is 102.22 grams
  • The average daily calorie intake per person is 3180
  • The maximum life expectancy is 81.10 years and the minimum is 62.70
  • The median income is $21216 per person per year

Region has four levels: Eastern Europe, Northern Europe, Southern Europe, and Western Europe

Period has three levels: 1980 - 1989, 1990 - 1999 and 2000 - 2004

What is/are the main feature(s) of interest in your dataset?

The main features of interest in the data set data_eu are Life_expectancy, Cholesterol, Bloodpressure and Income. I think that systolic blood pressure and cholesterol play an important role in human health and therefore in life expectancy. I also think that income has a positive impact on life expectancy, since it enables better health care and hence a longer life.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Other features that can support the investigation are Food, Sugar, BMI_Index, Region and Country. The daily amount of calories and the amount of sugar a person consumes are important factors for weight, and BMI is one indicator of a person’s health. Region divides Europe not only geographically but also economically, so the region in which people live may increase or decrease their chance of a longer life. Country is also a factor that affects people’s health: if a country has a sophisticated health system and its population has equal access to public health care services, life expectancy is also higher.

Did you create any new variables from existing variables in the dataset?

Yes. I added the variable Region to the data set to group the European countries by region, since region is not only a geographical distinction but also indicates a different level of affluence. I also added the variable Period in order to investigate the change in life expectancy over time.
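Derived variables like these could be built as sketched below. The Period labels and the three region assignments are taken from the output in this report; the use of `cut()` and of a named lookup vector is an assumption about method, and `region_lookup` is a hypothetical, deliberately incomplete table.

```r
# Year is stored as a factor, so convert via character to get the numeric year.
year     <- factor(c("1980", "1989", "1990", "1999", "2000", "2004"))
year_num <- as.integer(as.character(year))

# Bin years into the three periods; intervals are closed on the right.
period <- cut(year_num,
              breaks = c(1979, 1989, 1999, 2004),
              labels = c("1980 - 1989", "1990 - 1999", "2000 - 2004"))
table(period)

# Map countries to regions with a named vector (only three entries shown).
region_lookup <- c(Albania = "Eastern Europe",
                   Iceland = "Northern Europe",
                   Turkey  = "Southern Europe")
region <- region_lookup[c("Albania", "Iceland", "Turkey")]
region
```

Converting a factor with `as.integer()` directly would return the internal level codes (1, 2, ...) rather than the years, which is why the sketch goes through `as.character()` first.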

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

The Cholesterol data were created by merging the separate cholesterol data sets for males and females. The Systolic Blood Pressure and BMI data were merged from separate male and female data sets in the same way.

The remaining data sets do not distinguish between male and female; they have one value for both genders.
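The merging step might look like the following sketch. The data frame names (`chol_male`, `chol_female`, `food`) are hypothetical; the values are the Albania 1980 figures from the structure output at the start of the section.

```r
# Hypothetical per-gender source tables.
chol_male   <- data.frame(Country = "Albania", Year = 1980, Cholesterol = 5.01)
chol_female <- data.frame(Country = "Albania", Year = 1980, Cholesterol = 5.04)

# Tag each table with its gender, then stack them.
chol_male$Gender   <- "male"
chol_female$Gender <- "female"
chol <- rbind(chol_male, chol_female)

# A gender-independent variable is joined on Country and Year,
# so the same value is repeated for both genders.
food   <- data.frame(Country = "Albania", Year = 1980, Food = 2660)
merged <- merge(chol, food, by = c("Country", "Year"))
merged
```

This also explains why variables such as Food, Sugar and Income appear twice with identical values in the printed rows.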

Bivariate Plots Section

Life_expectancy vs. Cholesterol

The first scatterplot shows the relationship between life expectancy and cholesterol in Europe between 1980 and 2004.

Life expectancy rises with cholesterol up to around 5.5 mmol/L (age-standardized mean); beyond that, life expectancy decreases as cholesterol gets higher. The scatterplot also shows that life expectancy can be short even when cholesterol values are not high. This runs contrary to the common recommendation that cholesterol should be below 5.2 mmol/L (age-standardized mean).

The 10%, 50% and 90% quantiles of data_eu are represented by dark green, red and purple dashed lines, respectively; the mean is shown by a black dashed line. In the group with cholesterol below 5.2 (normal cholesterol), 10%, 50% and 90% of the observations have a life expectancy under 68.2, 71.8 and 79.4 years, respectively. In the group with cholesterol equal to or greater than 5.2 (borderline-high and high risk for many diseases), the corresponding values are 75.4, 77.6 and 80.1 years.

The results show that in general the group of people with high cholesterol lives longer than people with normal cholesterol values.
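Group-wise quantiles of this kind can be computed by splitting life expectancy on the 5.2 mmol/L threshold. The sketch below uses invented values in `toy`; only the threshold and the quantile levels come from the text.

```r
# Hypothetical observations.
toy <- data.frame(
  Cholesterol     = c(4.8, 5.0, 5.1, 5.3, 5.6, 5.9),
  Life_expectancy = c(66, 70, 72, 75, 78, 80)
)

# Label each observation by cholesterol group.
toy$chol_group <- ifelse(toy$Cholesterol < 5.2, "normal", "elevated")

# 10%, 50% and 90% quantiles of life expectancy within each group.
q <- tapply(toy$Life_expectancy, toy$chol_group,
            quantile, probs = c(0.1, 0.5, 0.9))
q
```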

The correlation between Life_expectancy and Cholesterol is 0.52.

## [1] 0.5206328

The variables Life_expectancy and Cholesterol are moderately related.

Next I look closer at life expectancy by region and country.

People in Eastern Europe have, on average, the lowest life expectancy in Europe.

People in Kazakhstan have, on average, the lowest life expectancy in Europe.

Below are the mean values of life expectancy in Eastern, Southern, Western and Northern Europe.

## [1] 70.55423
## [1] 75.77543
## [1] 76.5725
## [1] 77.1672

The median values of life expectancy in Eastern, Southern, Western and Northern Europe are:

## [1] 70.8
## [1] 76.8
## [1] 76.65
## [1] 77.2

Mean life expectancy is lowest in Eastern Europe, followed by Southern and Western Europe, and highest in Northern Europe. Median life expectancy is also lowest in Eastern Europe, followed by Western, Southern and Northern Europe.
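The difference between the two rankings can be reproduced from the printed numbers. The sketch below hard-codes the regional means and medians reported above; with the full data they could presumably be obtained with something like `tapply(data_eu$Life_expectancy, data_eu$Region, mean)`, which is an assumption about the original code.

```r
regions <- c("Eastern Europe", "Southern Europe",
             "Western Europe", "Northern Europe")

# Values copied from the output above, in the order Eastern, Southern, Western, Northern.
mean_le   <- setNames(c(70.55423, 75.77543, 76.5725, 77.1672), regions)
median_le <- setNames(c(70.8, 76.8, 76.65, 77.2), regions)

# Regions ranked from lowest to highest.
rank_by_mean   <- names(sort(mean_le))
rank_by_median <- names(sort(median_le))

rank_by_mean
rank_by_median
```

Southern and Western Europe swap places between the two rankings, which is what a negatively skewed regional distribution would produce.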

In Eastern Europe the cholesterol is, on average, under the recommended value of 5.2 mmol/L (age-standardized mean). The boxplot below visualizes this.

Below are the mean cholesterol values in Eastern, Southern, Western and Northern Europe.

## [1] 5.082286
## [1] 5.323818
## [1] 5.700119
## [1] 5.736535

The median cholesterol values in Eastern, Southern, Western and Northern Europe are:

## [1] 5.129331
## [1] 5.357579
## [1] 5.705645
## [1] 5.751593

The average cholesterol value is lowest in Eastern Europe, followed by Southern and Western Europe, and highest in Northern Europe.

Eastern Europe has, on average, both the lowest life expectancy and the lowest cholesterol value, followed by Southern and Western Europe. Northern Europe has, on average, both the highest life expectancy and the highest cholesterol value. This is contrary to what was expected.

Life expectancy vs. Systolic Blood Pressure

The first scatterplot shows the relationship between life expectancy and systolic blood pressure.

Few countries have the desired systolic blood pressure value.

##     Country Year Gender BMI_Index Bloodpressure Cholesterol  Sugar    Food
## 782 Iceland 2004 female  25.67006      119.9665    5.631419 153.43 3310.98
##     Income Life_expectancy          Region      Period
## 782  37482            81.1 Northern Europe 2000 - 2004

Only women in Iceland in 2004 had the desired systolic blood pressure. Most European countries have a systolic blood pressure in the prehypertension range. With a systolic blood pressure greater than 134 mm Hg the life expectancy decreases.

There are few data values with a systolic blood pressure equal to or less than 126 mm Hg. I adjust the x-axis and focus on this area.

At 120 mm Hg the maximum life expectancy is reached. Thereafter life expectancy decreases as systolic blood pressure increases.

As a next step I look more closely at data_eu for systolic values between 126 mm Hg and 139 mm Hg, because most of the data is concentrated in this interval.

Only 10% of the observations have a life expectancy below 70 years, 50% have a life expectancy below 77.6 years and 95% below 80 years. The 95% quantile, the 50% quantile and the mean decrease as the systolic blood pressure increases. In contrast, the 10% quantile increases as the systolic blood pressure increases.

I create a subset for the 0.1 quantile, that is, systolic blood pressure between 127 mm Hg and 139 mm Hg and life expectancy below 70 years. I then create a boxplot of Country vs. Life_expectancy.

##                   Country
## 1                 Armenia
## 2              Azerbaijan
## 3                 Belarus
## 4  Bosnia and Herzegovina
## 5                 Estonia
## 6                 Georgia
## 7                 Hungary
## 8              Kazakhstan
## 9                  Latvia
## 10                Moldova
## 11                Romania
## 12                 Russia
## 13                 Turkey
## 14                Ukraine

All countries except Turkey are Eastern European countries. This can also be shown below with a boxplot.
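A subset with these bounds can be expressed as a single logical condition. The sketch below is an illustration with a hypothetical `toy` data frame; only the thresholds (127 to 139 mm Hg, below 70 years) come from the text.

```r
# Hypothetical observations, one per country.
toy <- data.frame(
  Country         = c("Armenia", "Turkey", "Norway", "Iceland", "Russia"),
  Bloodpressure   = c(134.0, 127.1, 141.0, 120.0, 133.0),
  Life_expectancy = c(69.2, 62.7, 76.0, 81.1, 65.6),
  stringsAsFactors = FALSE
)

# Systolic blood pressure within 127-139 mm Hg and life expectancy below 70 years.
low_q <- subset(toy,
                Bloodpressure >= 127 & Bloodpressure <= 139 &
                  Life_expectancy < 70)

# Distinct countries that appear in the subset.
countries <- sort(unique(low_q$Country))
countries
```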

A characteristic of this subset is that its data start in 1992, except for Turkey and Hungary, whose data start in 1980.

But what about the life expectancy with systolic blood pressure greater than 139 mm Hg?

A few countries have a systolic blood pressure greater than 139 mm Hg. These are Finland, Norway, Ireland, Germany and Hungary between 1980 and 1988.

##   Country
## 1 Finland
## 2 Germany
## 3 Hungary
## 4 Ireland
## 5  Norway

Life Expectancy vs. Income

The scatterplot shows the relationship between life expectancy and income. It could be interpreted as high income prolonging life, especially up to a life expectancy of 76. Thereafter life expectancy grows much more slowly than income.

The correlation between Life_expectancy and Income is 0.76, which is strong.

The mean and median of income in each European region are shown below. The mean income values in Eastern, Southern, Western, and Northern Europe are:

## [1] 9826.147
## [1] 21548.51
## [1] 32402.58
## [1] 34303.06

The median income values in Eastern, Southern, Western, and Northern Europe are:

## [1] 9794
## [1] 21442
## [1] 31854
## [1] 32297

Eastern Europe has the lowest mean and median income, followed by Southern, Western and Northern Europe. The tables below list mean and median income by country for Eastern and Northern Europe.

## Source: local data frame [20 x 4]
## 
##                   Country mean_income median_income     n
##                    (fctr)       (dbl)         (dbl) (int)
## 1                 Moldova    2757.615          2596    26
## 2                 Armenia    2824.846          2636    26
## 3                 Georgia    3152.231          3185    26
## 4                 Albania    4440.640          4281    50
## 5              Azerbaijan    4610.077          4459    26
## 6  Bosnia and Herzegovina    4676.692          5609    26
## 7                 Ukraine    5684.846          5305    26
## 8                 Belarus    7061.385          6879    26
## 9          Macedonia, FYR    8154.385          8229    26
## 10               Bulgaria    9666.160         10088    50
## 11             Kazakhstan   10126.077          9706    26
## 12                 Latvia   10360.846          9912    26
## 13                Romania   11751.800         11449    50
## 14                 Poland   11944.320         11212    50
## 15                 Russia   13495.538         13173    26
## 16                Estonia   13682.923         13705    26
## 17                Croatia   14689.385         14652    26
## 18        Slovak Republic   14860.000         15062    24
## 19                Hungary   16988.200         16989    50
## 20               Slovenia   20757.231         20585    26
## Source: local data frame [5 x 4]
## 
##   Country mean_income median_income     n
##    (fctr)       (dbl)         (dbl) (int)
## 1 Finland    28341.84         27282    50
## 2 Iceland    29179.64         28629    50
## 3  Sweden    31473.16         30596    50
## 4 Denmark    34966.28         34008    50
## 5  Norway    47554.40         45742    50

Moldova has the lowest mean and median income in Eastern Europe, and Norway has the highest in Northern Europe.

Before I investigate other features I want to look closer at the correlations between them.

##                   BMI_Index Bloodpressure Cholesterol       Sugar
## BMI_Index        1.00000000    0.13814532  -0.1843664 -0.02879355
## Bloodpressure    0.13814532    1.00000000   0.1630146 -0.07668586
## Cholesterol     -0.18436641    0.16301460   1.0000000  0.59618776
## Sugar           -0.02879355   -0.07668586   0.5961878  1.00000000
## Food             0.07789474   -0.15094300   0.4399585  0.49576125
## Income          -0.07565821   -0.18894575   0.6348446  0.56375438
## Life_expectancy  0.01562241   -0.21458422   0.5206328  0.44626758
##                        Food      Income Life_expectancy
## BMI_Index        0.07789474 -0.07565821      0.01562241
## Bloodpressure   -0.15094300 -0.18894575     -0.21458422
## Cholesterol      0.43995853  0.63484464      0.52063276
## Sugar            0.49576125  0.56375438      0.44626758
## Food             1.00000000  0.56796435      0.41764922
## Income           0.56796435  1.00000000      0.76416961
## Life_expectancy  0.41764922  0.76416961      1.00000000

The correlation matrix shows the relationships between the variables in data_eu. Life expectancy correlates strongly with income (0.76) and moderately with cholesterol (0.52), sugar (0.45) and food (0.42). The relationship between life expectancy and income is the strongest among the variables. Unexpectedly, systolic blood pressure has only a weak negative correlation with life expectancy (correlation = -0.21), while BMI shows almost none (0.02). This seems unusual, since high systolic blood pressure causes serious heart disease and strokes, which often lead to death and shorten life expectancy. Cholesterol correlates moderately with income (0.63) and sugar (0.60), and sugar correlates moderately with income (0.56) and food (0.50).

The scatterplot matrix below gives an overview of the plots of all variables and the correlations between them.

Next I will look more closely at scatterplots of life expectancy against sugar and food, then at the scatterplot of income against cholesterol, and finally at the scatterplot of sugar against cholesterol.
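A correlation matrix like the one above can be produced with `cor()` on the numeric columns. The sketch below uses invented values in `toy`; indexing the result by row and column name is a simple way to avoid reading a neighbouring cell by mistake.

```r
# Hypothetical numeric columns (values invented for illustration).
toy <- data.frame(
  Bloodpressure   = c(140, 135, 133, 130, 128, 125),
  Income          = c(3000, 8000, 15000, 22000, 30000, 45000),
  Life_expectancy = c(66, 69, 72, 75, 78, 80)
)

# Pearson correlation matrix of all columns.
cm <- cor(toy)
round(cm, 2)

# Look up single pairs by name instead of by position.
bp_le     <- cm["Bloodpressure", "Life_expectancy"]
income_le <- cm["Income", "Life_expectancy"]
```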

Life expectancy vs. Sugar

According to the World Health Organization, the daily intake of sugar should be less than 5% of the daily food intake. The average daily energy requirement is 2400 calories (women: 2100 calories, men: 2700 calories). That means the daily sugar consumption should be at most 120 calories, which is equivalent to about 31 grams per day.

The sugar consumption in Europe is above the recommended value. It is interesting to know which countries have a healthy sugar consumption, and which have the minimum and the maximum value.

Armenia had the lowest sugar intake, in 1993:

##           Region Country Year
## 1 Eastern Europe Armenia 1993

The following countries had a healthy sugar intake:

##            Region                Country Year
## 1  Eastern Europe                Armenia 1992
## 2  Eastern Europe                Armenia 1993
## 3  Eastern Europe                Armenia 1994
## 4  Eastern Europe             Azerbaijan 1993
## 5  Eastern Europe             Azerbaijan 1994
## 6  Eastern Europe             Azerbaijan 2000
## 7  Eastern Europe Bosnia and Herzegovina 1994
## 8  Eastern Europe Bosnia and Herzegovina 1995
## 9  Eastern Europe Bosnia and Herzegovina 1996
## 10 Eastern Europe                Georgia 1993
## 11 Eastern Europe                Georgia 1994

Estonia had the highest sugar intake, in 2003 and 2004:

##           Region Country Year
## 1 Eastern Europe Estonia 2003
## 2 Eastern Europe Estonia 2004

All of the countries above are Eastern European countries.

Life expectancy vs. Food

It looks as if the higher the daily food intake is, the higher the life expectancy.

Life expectancy vs. Year, Region and Period

The median life expectancy increases until 1991. Then it decreases and after 1992 increases again.

Below are the median values of life expectancy in Eastern, Southern, Western and Northern Europe,

## [1] 70.8
## [1] 76.8
## [1] 76.65
## [1] 77.2

and the median value of life expectancy in periods: 1980 - 1989, 1990 - 1999, 2000 - 2004.

## [1] 74.7
## [1] 75.2
## [1] 75.95

The results show that median life expectancy by region is lowest in Eastern Europe, followed by Western, Southern and Northern Europe. Interestingly, median life expectancy in Southern Europe (76.8) is higher than in Western Europe (76.65).

The median life expectancy by period is lowest between 1980 and 1989 and increases in each following period.

Cholesterol vs. Income

The cholesterol value increases as income increases, up to an income of about 27500 $ PPP. Beyond that point cholesterol decreases as income increases. A possible reason for this fall is that people with higher income take cholesterol-lowering medicine or try to reduce their cholesterol intake.

Cholesterol vs. Sugar

The cholesterol values increase as sugar consumption increases, up to a turning point at about 120 grams of sugar. Beyond that point cholesterol decreases as sugar consumption increases. This might be explained by the use of cholesterol-lowering medicine.

##  [1] Country         Year            Gender          BMI_Index      
##  [5] Bloodpressure   Cholesterol     Sugar           Food           
##  [9] Income          Life_expectancy Region          Period         
## <0 rows> (or 0-length row.names)

No country has both normal cholesterol and healthy sugar consumption between 1980 and 1990.

##                   Country Year Gender BMI_Index Bloodpressure Cholesterol
## 1                 Armenia 1992 female  26.09322      134.0298    5.067329
## 2                 Armenia 1992   male  24.12982      134.9878    4.908696
## 3                 Armenia 1993 female  25.99631      133.6498    5.016787
## 4                 Armenia 1993   male  24.05854      134.5899    4.858117
## 5                 Armenia 1994   male  24.02297      134.2892    4.811986
## 6                 Armenia 1994 female  25.93440      133.3841    4.972368
## 7              Azerbaijan 1993   male  24.76250      133.1220    4.829249
## 8              Azerbaijan 1993 female  26.57865      130.3269    4.983414
## 9              Azerbaijan 1994 female  26.48774      129.9787    4.929545
## 10             Azerbaijan 1994   male  24.69113      132.6836    4.772017
## 11 Bosnia and Herzegovina 1994   male  25.09667      131.4639    4.663593
## 12 Bosnia and Herzegovina 1994 female  25.17425      132.7103    4.762538
## 13 Bosnia and Herzegovina 1995   male  25.07033      131.1335    4.630342
## 14 Bosnia and Herzegovina 1995 female  25.19614      132.6593    4.730610
## 15 Bosnia and Herzegovina 1996 female  25.28831      132.7398    4.718254
## 16 Bosnia and Herzegovina 1996   male  25.12328      131.0250    4.617823
## 17                Georgia 1993   male  24.82921      137.3146    4.881995
## 18                Georgia 1993 female  25.70626      132.9283    5.029236
## 19                Georgia 1994   male  24.73249      136.8331    4.814095
## 20                Georgia 1994 female  25.56143      132.5144    4.966557
##    Sugar    Food Income Life_expectancy         Region      Period
## 1  13.70 1833.98   1973            69.4 Eastern Europe 1990 - 1999
## 2  13.70 1833.98   1973            69.4 Eastern Europe 1990 - 1999
## 3   5.48 1868.46   1842            69.2 Eastern Europe 1990 - 1999
## 4   5.48 1868.46   1842            69.2 Eastern Europe 1990 - 1999
## 5  21.92 1945.81   1988            69.2 Eastern Europe 1990 - 1999
## 6  21.92 1945.81   1988            69.2 Eastern Europe 1990 - 1999
## 7  30.14 2217.92   4806            65.1 Eastern Europe 1990 - 1999
## 8  30.14 2217.92   4806            65.1 Eastern Europe 1990 - 1999
## 9  30.14 2100.15   3808            65.1 Eastern Europe 1990 - 1999
## 10 30.14 2100.15   3808            65.1 Eastern Europe 1990 - 1999
## 11 13.70 2724.51   1574            70.6 Eastern Europe 1990 - 1999
## 12 13.70 2724.51   1574            70.6 Eastern Europe 1990 - 1999
## 13 13.70 2737.26   1976            67.1 Eastern Europe 1990 - 1999
## 14 13.70 2737.26   1976            67.1 Eastern Europe 1990 - 1999
## 15 27.40 2851.18   3771            73.0 Eastern Europe 1990 - 1999
## 16 27.40 2851.18   3771            73.0 Eastern Europe 1990 - 1999
## 17 30.14 1569.77   2410            69.7 Eastern Europe 1990 - 1999
## 18 30.14 1569.77   2410            69.7 Eastern Europe 1990 - 1999
## 19 16.44 1776.61   2181            70.7 Eastern Europe 1990 - 1999
## 20 16.44 1776.61   2181            70.7 Eastern Europe 1990 - 1999

Only a few countries in Eastern Europe have normal cholesterol values and healthy sugar consumption between 1990 and 1999. These are:

  • Women and men in Armenia, 1992-1994
  • Women and men in Azerbaijan, 1993-1994
  • Women and men in Bosnia and Herzegovina, 1994-1996
  • Women and men in Georgia, 1993 and 1994
##      Country Year Gender BMI_Index Bloodpressure Cholesterol Sugar    Food
## 1 Azerbaijan 2000   male  24.51287      131.2280    4.534855  27.4 2406.22
## 2 Azerbaijan 2000 female  26.34661      128.5312    4.692858  27.4 2406.22
##   Income Life_expectancy         Region      Period
## 1   4459              68 Eastern Europe 2000 - 2004
## 2   4459              68 Eastern Europe 2000 - 2004

Between 2000 and 2004, only people in Azerbaijan in 2000 have normal cholesterol values and healthy sugar consumption.

BMI vs. Region

During 1980-2004 all European regions have an average BMI above the normal range. The countries in Northern Europe have the lowest BMI, followed by Western, Eastern and Southern Europe.

BMI vs. Period

Since 1980 the mean and median BMI have been increasing. Only between 1980 and 1989 is the average BMI in the normal range. The red dot in the boxplot denotes the mean value.

Food vs. Sugar

The food and sugar consumption in Europe is clearly above the recommended values for the most part.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Life_expectancy, which is one of my main features, correlates strongly with Income (correlation = 0.7641696). Life_expectancy correlates moderately with Cholesterol and has only a weak negative relationship with Bloodpressure (correlation = -0.2145842). Income correlates fairly strongly with Cholesterol (correlation = 0.6348446). The relationship between Cholesterol and Sugar is moderate.

The correlations between Life_expectancy and Sugar and between Life_expectancy and Food are also moderate.
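As a quick consistency check, the squared correlation between Life_expectancy and Income should equal the R-squared of the simple linear model reported later in this document (0.584). The sketch below first squares the reported value and then confirms the identity on simulated data. The simulated `income` and `life` vectors are assumptions for illustration, not the real data_eu.

```r
# Reported Pearson correlation between Life_expectancy and Income.
r_reported <- 0.7641696
print(round(r_reported^2, 3))  # 0.584

# Simulated data (illustrative only) to show that the squared Pearson
# correlation equals the R-squared of a one-predictor linear model.
set.seed(1)
income <- runif(50, min = 1500, max = 60000)
life   <- 68 + 0.00025 * income + rnorm(50, sd = 2.5)

r   <- cor(life, income)
fit <- lm(life ~ income)

same <- isTRUE(all.equal(r^2, summary(fit)$r.squared))
print(same)  # TRUE
```

In other words, Income alone accounts for roughly 58% of the variance in Life_expectancy, which is why it stands out as the strongest single relationship.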

According to the correlation matrix, BMI has a very weak correlation with all other features. I expected a strong relationship, since BMI categorizes people as underweight, normal weight or overweight and therefore indicates their health condition.

The median income and cholesterol increase in all periods and are lowest in Eastern Europe, followed by Southern, Western and Northern Europe. The median life expectancy increased in all three periods and is lowest in Eastern Europe, followed by Western, Southern and Northern Europe.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

During 1980-2004 all regions in Europe have an average BMI above the normal range. Northern Europe has the lowest BMI, followed by Western, Eastern and Southern Europe. BMI has been increasing since 1980. Only between 1980 and 1989 is the average BMI normal. The daily food and sugar consumption per person in Europe is above the recommended values.

What was the strongest relationship you found?

The strongest relationship I found is between Life_expectancy and Income. Both are main features.

Multivariate Plots Section

Income vs. Life Expectancy

Life expectancy in Europe increased. People with high income live longer than people with low income. At low income levels, small changes in income go along with large changes in life expectancy, whereas at high income levels changes in income have little impact on life expectancy.

Life expectancy in Southern, Western and Northern Europe increases from period to period. The increase in Southern Europe is interesting because income there is not as high as in Western and Northern Europe and there are many outliers. In Eastern Europe median life expectancy decreased between 1990 and 1999, and the difference between minimum and maximum life expectancy increased.

More information about the development of income and life expectancy in Europe over the years is shown in the graph below.

In general, life expectancy in most European countries increased during the analyzed period. In some Eastern European countries it decreased slightly. In Russia, for example, life expectancy decreased even though income had been increasing since 1998.

In Ukraine, Moldova, Romania and Macedonia (FYR) income decreased, but life expectancy stayed at almost the same level or decreased slightly between 1980 and 2004. The decrease of life expectancy in Bosnia and Herzegovina can be explained by the war at the beginning of the 1990s.

Eastern European countries have the lowest income, followed by Southern European countries.

Life Expectancy vs. Cholesterol

Median life expectancy in Western, Northern and Southern Europe increased, while the range of cholesterol values for these countries decreased. In Eastern Europe median life expectancy decreased between 1990 and 1999 but increased in the following years. Turkey is the country in Southern Europe with many outliers.

In most European countries life expectancy increases when cholesterol decreases. This holds for both genders. Only in a few Eastern European countries, such as Belarus, Russia and Ukraine, does life expectancy decrease when cholesterol decreases.

##        Country        Year       Gender     BMI_Index     Bloodpressure  
##  Russia    :26   1992   : 2   female:13   Min.   :24.95   Min.   :128.4  
##  Albania   : 0   1993   : 2   male  :13   1st Qu.:24.99   1st Qu.:129.0  
##  Armenia   : 0   1994   : 2               Median :25.92   Median :131.0  
##  Austria   : 0   1995   : 2               Mean   :25.79   Mean   :130.4  
##  Azerbaijan: 0   1996   : 2               3rd Qu.:26.48   3rd Qu.:131.4  
##  Belarus   : 0   1997   : 2               Max.   :26.81   Max.   :133.0  
##  (Other)   : 0   (Other):14                                              
##   Cholesterol        Sugar             Food          Income     
##  Min.   :4.929   Min.   : 90.41   Min.   :2827   Min.   :11173  
##  1st Qu.:5.046   1st Qu.: 98.63   1st Qu.:2884   1st Qu.:11925  
##  Median :5.163   Median :106.85   Median :2926   Median :13173  
##  Mean   :5.172   Mean   :107.06   Mean   :2958   Mean   :13496  
##  3rd Qu.:5.269   3rd Qu.:117.81   3rd Qu.:3032   3rd Qu.:14629  
##  Max.   :5.496   Max.   :120.55   Max.   :3143   Max.   :16967  
##                                                                 
##  Life_expectancy             Region           Period  
##  Min.   :63.60   Eastern Europe :26   1980 - 1989: 0  
##  1st Qu.:64.90   Northern Europe: 0   1990 - 1999:16  
##  Median :65.20   Southern Europe: 0   2000 - 2004:10  
##  Mean   :65.64   Western Europe : 0                   
##  3rd Qu.:66.20                                        
##  Max.   :68.00                                        
## 

Life expectancy in Russia decreased until 1994. Between 1994 and 1998 it increased, and then it decreased again. During this period the cholesterol values were normal. It seems to me that the negative change in life expectancy has less to do with cholesterol than with the political, economic and social situation in the country during that period.

Although women have higher cholesterol values than men in most countries, they have almost the same life expectancy.

Life expectancy vs. Region

##      Country         Year        Gender      BMI_Index     Bloodpressure  
##  Albania : 50   1993   : 40   female:319   Min.   :23.87   Min.   :128.4  
##  Bulgaria: 50   1994   : 40   male  :319   1st Qu.:25.04   1st Qu.:131.1  
##  Hungary : 50   1995   : 40                Median :25.44   Median :132.9  
##  Poland  : 50   1996   : 40                Mean   :25.47   Mean   :133.1  
##  Romania : 50   1997   : 40                3rd Qu.:25.89   3rd Qu.:135.2  
##  Armenia : 26   1998   : 40                Max.   :27.07   Max.   :139.1  
##  (Other) :362   (Other):398                                               
##   Cholesterol        Sugar             Food          Income     
##  Min.   :4.502   Min.   :  5.48   Min.   :1570   Min.   : 1466  
##  1st Qu.:4.924   1st Qu.: 60.27   1st Qu.:2727   1st Qu.: 5124  
##  Median :5.129   Median : 84.93   Median :2926   Median : 9794  
##  Mean   :5.082   Mean   : 86.29   Mean   :2921   Mean   : 9826  
##  3rd Qu.:5.273   3rd Qu.:111.64   3rd Qu.:3182   3rd Qu.:13705  
##  Max.   :5.554   Max.   :167.12   Max.   :3755   Max.   :25694  
##                                                                 
##  Life_expectancy             Region            Period   
##  Min.   :63.00   Eastern Europe :638   1980 - 1989:100  
##  1st Qu.:68.83   Northern Europe:  0   1990 - 1999:338  
##  Median :70.80   Southern Europe:  0   2000 - 2004:200  
##  Mean   :70.55   Western Europe :  0                    
##  3rd Qu.:72.80                                          
##  Max.   :76.80                                          
## 
##      Country        Year        Gender      BMI_Index     Bloodpressure  
##  Cyprus  :50   1980   : 14   female:175   Min.   :23.66   Min.   :123.0  
##  Greece  :50   1981   : 14   male  :175   1st Qu.:25.01   1st Qu.:127.5  
##  Italy   :50   1982   : 14                Median :25.57   Median :130.5  
##  Malta   :50   1983   : 14                Mean   :25.61   Mean   :130.5  
##  Portugal:50   1984   : 14                3rd Qu.:26.09   3rd Qu.:133.1  
##  Spain   :50   1985   : 14                Max.   :28.01   Max.   :138.1  
##  (Other) :50   (Other):266                                               
##   Cholesterol        Sugar             Food          Income     
##  Min.   :4.722   Min.   : 65.75   Min.   :2758   Min.   : 7828  
##  1st Qu.:5.249   1st Qu.: 76.71   1st Qu.:3217   1st Qu.:15854  
##  Median :5.358   Median : 84.93   Median :3405   Median :21442  
##  Mean   :5.324   Mean   : 93.78   Mean   :3361   Mean   :21549  
##  3rd Qu.:5.462   3rd Qu.: 95.20   3rd Qu.:3541   3rd Qu.:26226  
##  Max.   :6.127   Max.   :153.43   Max.   :3713   Max.   :36962  
##                                                                 
##  Life_expectancy             Region            Period   
##  Min.   :62.70   Eastern Europe :  0   1980 - 1989:140  
##  1st Qu.:74.50   Northern Europe:  0   1990 - 1999:140  
##  Median :76.80   Southern Europe:350   2000 - 2004: 70  
##  Mean   :75.78   Western Europe :  0                    
##  3rd Qu.:78.30                                          
##  Max.   :80.80                                          
## 
##         Country         Year        Gender      BMI_Index    
##  Austria    : 50   1980   : 16   female:200   Min.   :23.74  
##  Belgium    : 50   1981   : 16   male  :200   1st Qu.:24.67  
##  France     : 50   1982   : 16                Median :25.13  
##  Germany    : 50   1983   : 16                Mean   :25.20  
##  Ireland    : 50   1984   : 16                3rd Qu.:25.72  
##  Netherlands: 50   1985   : 16                Max.   :27.34  
##  (Other)    :100   (Other):304                               
##  Bloodpressure    Cholesterol        Sugar             Food     
##  Min.   :120.9   Min.   :5.303   Min.   : 90.41   Min.   :3094  
##  1st Qu.:128.5   1st Qu.:5.552   1st Qu.:109.59   1st Qu.:3316  
##  Median :132.0   Median :5.706   Median :117.81   Median :3433  
##  Mean   :131.7   Mean   :5.700   Mean   :120.10   Mean   :3439  
##  3rd Qu.:135.3   3rd Qu.:5.831   3rd Qu.:126.03   3rd Qu.:3569  
##  Max.   :140.0   Max.   :6.241   Max.   :164.38   Max.   :3817  
##                                                                 
##      Income      Life_expectancy             Region            Period   
##  Min.   :16078   Min.   :72.40   Eastern Europe :  0   1980 - 1989:160  
##  1st Qu.:26758   1st Qu.:75.30   Northern Europe:  0   1990 - 1999:160  
##  Median :31854   Median :76.65   Southern Europe:  0   2000 - 2004: 80  
##  Mean   :32403   Mean   :76.57   Western Europe :400                    
##  3rd Qu.:37425   3rd Qu.:78.00                                          
##  Max.   :49882   Max.   :81.00                                          
## 
##     Country        Year        Gender      BMI_Index     Bloodpressure  
##  Denmark:50   1980   : 10   female:125   Min.   :23.38   Min.   :120.0  
##  Finland:50   1981   : 10   male  :125   1st Qu.:24.68   1st Qu.:128.7  
##  Iceland:50   1982   : 10                Median :25.05   Median :132.4  
##  Norway :50   1983   : 10                Mean   :25.08   Mean   :131.8  
##  Sweden :50   1984   : 10                3rd Qu.:25.49   3rd Qu.:135.3  
##  Albania: 0   1985   : 10                Max.   :26.73   Max.   :143.1  
##  (Other): 0   (Other):190                                               
##   Cholesterol        Sugar             Food          Income     
##  Min.   :5.098   Min.   : 90.41   Min.   :2901   Min.   :21965  
##  1st Qu.:5.575   1st Qu.:112.33   1st Qu.:3089   1st Qu.:27789  
##  Median :5.752   Median :120.55   Median :3154   Median :32297  
##  Mean   :5.737   Mean   :126.09   Mean   :3169   Mean   :34303  
##  3rd Qu.:5.900   3rd Qu.:139.73   3rd Qu.:3250   3rd Qu.:37941  
##  Max.   :6.192   Max.   :164.38   Max.   :3458   Max.   :62370  
##                                                                 
##  Life_expectancy             Region            Period   
##  Min.   :73.70   Eastern Europe :  0   1980 - 1989:100  
##  1st Qu.:75.80   Northern Europe:250   1990 - 1999:100  
##  Median :77.20   Southern Europe:  0   2000 - 2004: 50  
##  Mean   :77.17   Western Europe :  0                    
##  3rd Qu.:78.50                                          
##  Max.   :81.10                                          
## 

Life expectancy in Eastern Europe is the lowest in Europe. By mean value it is followed by Southern, Western and Northern Europe. Average income and cholesterol values are also lowest in Eastern Europe.

In Northern and Western Europe both income and cholesterol values are very high, yet people still live longer.

In Southern Europe income is lower than in Western and Northern Europe, but median life expectancy is higher than in Western Europe. It seems that not only income but also the country in which people live has an impact on life expectancy.

The boxplot below gives more detailed information.

The plot confirms that life expectancy in Eastern Europe is the lowest in Europe.

## 
## Call:
## lm(formula = data_eu$Life_expectancy ~ data_eu$Income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3006 -1.7113  0.2188  1.8447  5.6592 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.878e+01  1.292e-01  532.42   <2e-16 ***
## data_eu$Income 2.486e-04  5.187e-06   47.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.611 on 1636 degrees of freedom
## Multiple R-squared:  0.584,  Adjusted R-squared:  0.5837 
## F-statistic:  2296 on 1 and 1636 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = data_eu$Life_expectancy ~ data_eu$Income + data_eu$Cholesterol)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4357 -1.7355  0.1785  1.8662  5.7525 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         6.546e+01  1.158e+00  56.517  < 2e-16 ***
## data_eu$Income      2.363e-04  6.699e-06  35.274  < 2e-16 ***
## data_eu$Cholesterol 6.666e-01  2.308e-01   2.888  0.00393 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.605 on 1635 degrees of freedom
## Multiple R-squared:  0.5861, Adjusted R-squared:  0.5856 
## F-statistic:  1157 on 2 and 1635 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = data_eu$Life_expectancy ~ data_eu$Income + data_eu$Cholesterol + 
##     data_eu$Country)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6002 -0.4134  0.0395  0.4255  4.7727 
## 
## Coefficients:
##                                         Estimate Std. Error t value
## (Intercept)                            9.016e+01  1.139e+00  79.174
## data_eu$Income                         1.862e-04  7.958e-06  23.398
## data_eu$Cholesterol                   -3.603e+00  2.232e-01 -16.143
## data_eu$CountryArmenia                -2.775e+00  2.248e-01 -12.346
## data_eu$CountryAustria                 1.444e-01  3.893e-01   0.371
## data_eu$CountryAzerbaijan             -6.993e+00  2.270e-01 -30.809
## data_eu$CountryBelarus                -4.417e+00  2.313e-01 -19.093
## data_eu$CountryBelgium                 8.980e-01  4.113e-01   2.183
## data_eu$CountryBosnia and Herzegovina -1.141e+00  2.267e-01  -5.034
## data_eu$CountryBulgaria               -1.718e+00  2.065e-01  -8.322
## data_eu$CountryCroatia                -9.280e-02  2.482e-01  -0.374
## data_eu$CountryCyprus                  2.855e+00  2.977e-01   9.590
## data_eu$CountryDenmark                -4.196e-01  4.262e-01  -0.984
## data_eu$CountryEstonia                -3.960e+00  2.648e-01 -14.954
## data_eu$CountryFinland                 1.226e+00  3.833e-01   3.198
## data_eu$CountryFrance                  2.225e+00  3.900e-01   5.706
## data_eu$CountryGeorgia                -2.033e+00  2.264e-01  -8.984
## data_eu$CountryGermany                 6.726e-01  4.244e-01   1.585
## data_eu$CountryGreece                  1.758e+00  2.648e-01   6.641
## data_eu$CountryHungary                -4.186e+00  2.388e-01 -17.527
## data_eu$CountryIceland                 4.373e+00  4.258e-01  10.270
## data_eu$CountryIreland                 3.313e-01  3.593e-01   0.922
## data_eu$CountryItaly                   1.011e+00  3.403e-01   2.972
## data_eu$CountryKazakhstan             -1.030e+01  2.253e-01 -45.708
## data_eu$CountryLatvia                 -4.202e+00  2.468e-01 -17.026
## data_eu$CountryMacedonia, FYR         -1.751e-01  2.229e-01  -0.785
## data_eu$CountryMalta                   4.249e+00  3.064e-01  13.866
## data_eu$CountryMoldova                -4.871e+00  2.242e-01 -21.728
## data_eu$CountryNetherlands             1.167e+00  4.019e-01   2.903
## data_eu$CountryNorway                 -1.185e+00  5.081e-01  -2.332
## data_eu$CountryPoland                 -1.252e+00  2.216e-01  -5.651
## data_eu$CountryPortugal                3.815e-03  2.730e-01   0.014
## data_eu$CountryRomania                -3.784e+00  2.054e-01 -18.424
## data_eu$CountryRussia                 -8.398e+00  2.477e-01 -33.903
## data_eu$CountrySlovak Republic        -1.055e+00  2.618e-01  -4.029
## data_eu$CountrySlovenia                4.123e-01  2.942e-01   1.402
## data_eu$CountrySpain                   2.400e+00  3.057e-01   7.849
## data_eu$CountrySweden                  2.464e+00  3.893e-01   6.327
## data_eu$CountrySwitzerland             7.660e-02  4.869e-01   0.157
## data_eu$CountryTurkey                 -6.762e+00  1.862e-01 -36.312
## data_eu$CountryUkraine                -5.620e+00  2.221e-01 -25.309
## data_eu$CountryUnited Kingdom          1.841e+00  3.898e-01   4.723
##                                       Pr(>|t|)    
## (Intercept)                            < 2e-16 ***
## data_eu$Income                         < 2e-16 ***
## data_eu$Cholesterol                    < 2e-16 ***
## data_eu$CountryArmenia                 < 2e-16 ***
## data_eu$CountryAustria                 0.71071    
## data_eu$CountryAzerbaijan              < 2e-16 ***
## data_eu$CountryBelarus                 < 2e-16 ***
## data_eu$CountryBelgium                 0.02916 *  
## data_eu$CountryBosnia and Herzegovina 5.36e-07 ***
## data_eu$CountryBulgaria                < 2e-16 ***
## data_eu$CountryCroatia                 0.70849    
## data_eu$CountryCyprus                  < 2e-16 ***
## data_eu$CountryDenmark                 0.32504    
## data_eu$CountryEstonia                 < 2e-16 ***
## data_eu$CountryFinland                 0.00141 ** 
## data_eu$CountryFrance                 1.38e-08 ***
## data_eu$CountryGeorgia                 < 2e-16 ***
## data_eu$CountryGermany                 0.11319    
## data_eu$CountryGreece                 4.27e-11 ***
## data_eu$CountryHungary                 < 2e-16 ***
## data_eu$CountryIceland                 < 2e-16 ***
## data_eu$CountryIreland                 0.35667    
## data_eu$CountryItaly                   0.00300 ** 
## data_eu$CountryKazakhstan              < 2e-16 ***
## data_eu$CountryLatvia                  < 2e-16 ***
## data_eu$CountryMacedonia, FYR          0.43228    
## data_eu$CountryMalta                   < 2e-16 ***
## data_eu$CountryMoldova                 < 2e-16 ***
## data_eu$CountryNetherlands             0.00375 ** 
## data_eu$CountryNorway                  0.01981 *  
## data_eu$CountryPoland                 1.89e-08 ***
## data_eu$CountryPortugal                0.98885    
## data_eu$CountryRomania                 < 2e-16 ***
## data_eu$CountryRussia                  < 2e-16 ***
## data_eu$CountrySlovak Republic        5.86e-05 ***
## data_eu$CountrySlovenia                0.16123    
## data_eu$CountrySpain                  7.61e-15 ***
## data_eu$CountrySweden                 3.23e-10 ***
## data_eu$CountrySwitzerland             0.87501    
## data_eu$CountryTurkey                  < 2e-16 ***
## data_eu$CountryUkraine                 < 2e-16 ***
## data_eu$CountryUnited Kingdom         2.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9121 on 1596 degrees of freedom
## Multiple R-squared:  0.9505, Adjusted R-squared:  0.9492 
## F-statistic: 746.8 on 41 and 1596 DF,  p-value: < 2.2e-16

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The relationship between life expectancy and income is strong: higher income goes along with higher life expectancy. The country in which people live is also an important factor. If people live in countries with a sophisticated health system, they may live longer even when their cholesterol values are above normal and their income is not so high.

Were there any interesting or surprising interactions between features?

Women have almost the same life expectancy as men even though they have higher cholesterol values.

Life expectancy in Eastern European countries is lower than in the other European regions. The highest median life expectancy is in Northern Europe, followed by Southern Europe and Western Europe.

The results for Southern Europe are interesting. Even though income there is lower than in Western and Northern European countries, people still have a high life expectancy.

Country is a variable which plays an important role in life expectancy.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

Yes. I created a linear model of Life_expectancy on Income and then added the variables Cholesterol and Country. The full model gave a very high R^2 value of 0.95.

Adding the Cholesterol variable improved the R^2 by only about 0.002 (from 0.584 to 0.586). Adding the variable Country raised the R^2 to 0.95.

The results above can be interpreted as follows:

  • 58.4% of the variation in life expectancy in Europe is explained by income.
  • 58.6% of the variation in life expectancy in Europe is explained by income and cholesterol together.
  • 95% of the variation in life expectancy in Europe is explained by the country in which people live together with their income and cholesterol values.
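The comparison of the three nested models can be sketched as follows. This is an illustration on simulated data, not the original analysis: the country labels, the values in `country_effect` and all coefficients are assumptions. Unlike the calls shown in the output above, the sketch passes the data frame through the `data` argument of `lm` rather than writing `data_eu$` in the formula, which is the more common idiom.

```r
# Simulated stand-in for data_eu; all numbers here are assumptions.
set.seed(42)
country_effect <- c(A = 0, B = -4, C = 3, D = -7)  # hypothetical offsets
toy <- data.frame(
  Country     = factor(rep(names(country_effect), each = 25)),
  Income      = runif(100, min = 1500, max = 60000),
  Cholesterol = runif(100, min = 4.5, max = 6.2)
)
toy$Life_expectancy <- 70 + 0.0002 * toy$Income +
  unname(country_effect[as.character(toy$Country)]) +
  rnorm(100, sd = 1)

# Nested models: each one adds a predictor to the previous model.
m1 <- lm(Life_expectancy ~ Income, data = toy)
m2 <- update(m1, . ~ . + Cholesterol)
m3 <- update(m2, . ~ . + Country)

# Collect R-squared so the gain from each added variable is visible.
r2 <- sapply(
  list(income = m1, plus_cholesterol = m2, plus_country = m3),
  function(m) summary(m)$r.squared
)
print(round(r2, 3))
```

Because the models are nested, R^2 can never decrease when a predictor is added, so a small gain (as with Cholesterol) says more than the mere fact of an increase. Comparing adjusted R^2 values would additionally account for the 39 extra coefficients that Country introduces.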

Final Plots and Summary

Plot One

Description One

The distribution of life expectancy in Europe between 1980 and 2004 is negatively skewed. The mean value is 74.15 and the median value is 75. A life expectancy of 76 years has the highest frequency.
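A mean below the median, as reported here, is typical of a negatively skewed distribution. The sketch below computes the moment-based skewness coefficient by hand, since base R has no built-in skewness function. The `toy` vector is made up for illustration and the helper `skewness` is not part of the original analysis.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 1.5.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Made-up life expectancy values with a long left tail.
toy <- c(63, 68, 72, 74, 75, 76, 76, 77, 78, 80)

stats <- c(mean = mean(toy), median = median(toy), skewness = skewness(toy))
print(stats)
```

A negative coefficient together with a mean below the median indicates a left tail, here caused by a few low values pulling the mean down.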

Plot Two

Description Two

The scatterplot shows the relationship between life expectancy and income. The smooth curve rises steeply between 10000 and 23750 $ PPP. This could be interpreted to mean that higher income prolongs life, especially up to a life expectancy of about 76 years. Thereafter life expectancy grows much more slowly than income.

Plot Three

Description Three

Life expectancy depends not only on income, as shown before, but also on the country in which people live. Low-income countries have lower life expectancy. The Eastern European countries and Turkey have the lowest life expectancy in Europe. Countries in Southern Europe have a relatively high life expectancy relative to their income.

Reflection

To examine life expectancy in Europe I created the data set data_eu, which is a collection of data sets taken from Gapminder. I explored Life_expectancy across the main features Cholesterol, Bloodpressure and Income, and also across other features such as Sugar, Food, Country, Region, Period and Gender.

Using scatterplots I found interesting relationships between life expectancy, income and cholesterol in different European regions. Median Income and Cholesterol were lowest in Eastern Europe, followed by Southern, Western and Northern Europe, which had the highest values. For median Life_expectancy the order is different: Eastern Europe had the lowest value, followed by Western, Southern and Northern Europe. Southern Europe has a higher Life_expectancy than its median Income and Cholesterol values would suggest. The Mediterranean diet might be a reason for that.

I was surprised that systolic blood pressure had almost no relationship with life expectancy, contrary to what I had assumed. Perhaps medical treatment of high systolic blood pressure is a reason for that.

I created a linear model based on Income, Cholesterol and Country which gave a very high R^2. Surprisingly, the variable Cholesterol had only a weak impact on the prediction model, raising the R^2 by about two tenths of a percentage point.

In some cases I had to use rounded values of Cholesterol, Sugar, Food and Bloodpressure in order to see a clear pattern.

The data set contains data for forty countries from 1980 until 2004. Most Eastern European countries provide data only after 1992. It would be interesting to have newer data to see more recent developments.

I believe that life expectancy depends on more than income, cholesterol and country. The feature Country in particular deserves a closer look, because it acts as a combined value for many factors. Examples of such factors are mortality, environmental disasters, epidemic diseases and total health spending by country and gender. These features should be taken into account in a deeper investigation.